Preface



We are delighted to present the proceedings of the 7th IFIP/IEEE International 
Conference on Management of Multimedia Networks & Services (MMNS). 

The MMNS 2004 conference was held in San Diego, California, USA on October 
4-6, 2004. As in previous years, the conference brought together an international 
audience of researchers and scientists from industry and academia who are re- 
searching and developing state-of-the-art management systems, while creating a 
public venue for results dissemination and intellectual collaboration. 

This year marked a challenging chapter in the advancement of management sys- 
tems for the wider management research community, with the growing complex- 
ities of the Internet, the proliferation of alternative wireless networks and mobile 
services, intelligent and high-speed networks, scalable multimedia services, and 
the convergence of computing and communications for data and voice delivery. 
Contributions from the research community met this challenge with 84 paper
submissions; 26 high-quality papers were subsequently selected to form
the MMNS 2004 technical program. The diverse topics in this year’s program
included novel protocols in wireless systems, multimedia over wireless, mobility 
management, multimedia service control, proactive techniques for QoS manage- 
ment, MPLS traffic engineering and resiliency, distributed systems management, 
scalable multimedia systems, and adaptive methods for streaming multimedia. 
The conference chairs would first like to thank all those authors who contributed 
to an outstanding MMNS 2004 technical program, second the Program Commit- 
tee and Organizing Committee chairs for their support throughout the develop- 
ment of the program and conference, third the worldwide experts who assisted 
in a rigorous review process, and fourth the sponsors Intel Corporation, IFIP 
and IEEE, without whose support we would not have had such a professional 
conference. Last and certainly not least, we express grateful thanks to Marie 
Dudek who was instrumental in helping to ensure a top-quality MMNS 2004. 

We truly feel that this year’s proceedings mark another significant point in the 
development of MMNS as a primary venue for the advancement of network and 
service management, and also novel architectures and designs in technology and 
network services, to enable multimedia proliferation.

Abstract. The increase in the bandwidth of wireless channels and in the computing
power of mobile devices has increased interest in video communications over
wireless networks. However, the high error rate and the rapidly changing quality
of the radio channels can be devastating for the transport of compressed
video. In motion-compensated coding, errors due to packet losses are propagated
from reference frames to dependent frames, causing lasting visual effects.
In addition, the bounded playout delay for interactive video limits the effective- 
ness of retransmission-based error control. In this paper, we propose a mecha- 
nism that combines retransmission-based error control with path diversity in 
wireless networks, to provide different levels of protection to packets according 
to their importance to the reconstructed video quality. We evaluated the effec- 
tiveness of the mechanism under different network conditions. Simulation re- 
sults show that the mechanism is able to maintain the video quality under dif- 
ferent loss rates, with less overhead compared to error control techniques that 
depend on reference frame updates. 



1 Introduction 

The increase in the bandwidth of wireless channels and in the computing power of
mobile devices has increased interest in video communications over mobile wireless
networks. However, in such networks there is no end-to-end guaranteed Quality of
Service (QoS) and packets may be discarded due to bit errors. Wireless channels exhibit
error rates that are typically around $10^{-2}$, with errors ranging from single bit errors to burst
errors or even intermittent loss of the connection. The high error rates are due to 
multi-path fading, which characterizes radio channels, while the loss of the connec- 
tion can be due to the mobility in such networks. In addition, designing the wireless 
communication system to mitigate these effects can be complicated by the rapidly 
changing quality of the radio channel. 

The effect of the high error rates in wireless channels can be devastating for the 
transport of compressed video. Video standards, such as MPEG and H.263, use
motion-compensated prediction to exploit the redundancy between successive frames of
a video sequence [1]. Although motion-compensated prediction can achieve high 
compression efficiency, it is not designed for transmission over lossy channels. In this 
coding scheme the video sequence consists of two types of video frames: intra-frames 
(I-frames) and inter-frames (P- or B-frames). An I-frame is encoded by removing only
the spatial redundancy present in the frame. A P-frame is encoded through motion
estimation using the preceding I- or P-frame as a reference frame. A B-frame is encoded
bidirectionally using the preceding and succeeding reference frames. These dependencies
pose a severe problem, namely error propagation (or error spread), where errors due to packet
loss in a reference frame propagate to all of the dependent frames leading to percepti- 
ble visual artifacts that can be long-lasting. 

Different approaches have been proposed to tackle the error propagation problem. 
One approach is to reduce the time between intra-coded frames, in the extreme case to 
a single frame. Unfortunately, I-frames typically require several times more bits than 
P- or B-frames. While this is acceptable for high bit-rate applications, or even neces- 
sary for broadcasting, where many receivers need to resynchronize at random times, 
the use of the intra-coding mode should be restricted as much as possible in low bit
rate point-to-point transmission, as is typical for wireless networks. The widely varying
error conditions in wireless channels limit the effectiveness of classic Forward Error 
Correction (FEC), since a worst-case design would lead to a prohibitive amount of 
redundancy. Closed-loop error control techniques like retransmission have been 
shown to be more effective than FEC and successfully applied to wireless video 
transmission. But for interactive video applications, the playout delay at the receiver 
is limited, which limits the number of admissible retransmissions [2]. 

In this paper, we propose a mechanism to provide error resilience to interactive 
video applications in wireless networks. The mechanism extends retransmission- 
based error control with redundant retransmissions on diverse paths between the 
sender and receiver. The mechanism factors in the importance of the packets as well 
as the end-to-end latency constraints to minimize the overhead and maximize the 
quality at the receiver. Our simulation results indicate that the proposed mechanism 
performs significantly better than reference frame update schemes in terms of per- 
ceived quality measured at the receiver as well as the transmission overhead. 

This paper is organized as follows. Section 2 reviews related work.
The proposed mechanism is presented in Section 3. Section 4 discusses the
implementation of the mechanism. Section 5 presents experiments that we performed to
examine the proposed mechanism and to compare it to a reference frame update error
control mechanism. Finally, conclusions are outlined in Section 6.



2 Related Work 

An analysis of the effects of packet loss on the quality of MPEG-4 video is presented in
reference [3], which also proposes a model to explain these effects. The model shows
that errors in reference frames are more detrimental than those in dependent frames,
due to propagation of errors, and therefore reference frames should be given a higher 
level of protection. 

Forward error correction (FEC) has been proposed to provide error recovery for 
video packets by adding redundant information to the compressed video bit-stream so 
that the original video can be reconstructed in the presence of packet loss. Reference [4]
presents Priority Encoding Transmission (PET), where different segments of video
data are protected with redundant information according to their priority, so that in- 
formation with higher priority can have a higher chance of correct reception. Typical 
FEC schemes are stationary and must be implemented to guarantee a certain QoS 
requirement for the worst-case channel characteristics. Because the wireless
channel is non-stationary and the channel bit error rate varies over time, FEC
techniques incur unnecessary overhead that reduces the throughput when the
channel is relatively error free.

Unlike FEC, which adds redundancy regardless of correct receipt or loss, reference 
[5] proposes retransmission-based error control schemes, such as Automatic Repeat 
Request (ARQ), for real time data. Retransmission-based schemes resend only the 
packets that are lost, thus they are adaptive to varying loss characteristics, resulting in 
efficient use of network resources. However, retransmission schemes are limited by 
the receiver’s playout delay, as well as the Round Trip Time (RTT). Reference [6]
presents Time-Lined TCP (TLTCP), which extends the TCP retransmission to support
time-lines. Instead of treating all data as a byte stream, TLTCP allows the application
to associate data with deadlines.

An overview of the different error concealment mechanisms proposed to minimize the
visible distortion of the video due to packet loss is presented in [7]. Error concealment
techniques depend on the smoothness property of images, as well as on the fact that the
human eye can tolerate more distortion in high frequency components than in low frequency
components. Reference [2] shows that detectable artifacts can still exist after the error 
concealment, and that the degree of these artifacts depends on the amount of lost data, 
the type of the stream and the effectiveness of the concealment algorithm. High- 
quality concealment algorithms require substantial additional computation complex- 
ity, which is acceptable for decoding still images but not tolerable in decoding real- 
time video. In addition, the effectiveness of concealment depends on the amount and 
correct interpretation of received data, thus concealment becomes much harder with 
the bursty losses in wireless channels. 

Error-resilient encoding schemes, such as Multiple Description Coding (MDC) and Layered
Coding (LC), have been proposed to combat channel-induced impairments. MDC generates
multiple equally important and independent substreams, also called descriptions [8].
Each description can be independently decoded and is of equal importance in terms of
quality, i.e. there is no decoding dependency between any two of the descriptions.
As the decoder receives more descriptions, the quality gradually increases,
no matter which descriptions are received. LC generates one base-layer bitstream and
several enhancement-layer bitstreams [9]. The base-layer can be decoded to provide a
basic video quality while the enhancement-layers are mainly used to refine the quality 
of the video that is reconstructed from the base-layer. If the base-layer is corrupted, 
the enhancement-layers become useless, even if they are received perfectly. 



3 Prioritized Retransmission over Diverse Paths 

The ability to successfully decode a compressed bitstream with inter-frame dependen- 
cies depends heavily on the receipt of reference frames, and to a lesser degree on 
dependent frames. Thus, we propose a mechanism to provide adaptive end-to-end 
unequal error protection for packets belonging to different frames, without sacrificing 
the timely-delivery requirement for interactive video. We achieve the unequal error
protection through redundant retransmissions over diverse paths between the sender
and receiver, based on the importance of the packets. There are several ways to set up
multiple diverse paths in a wireless network. In a single-hop wireless network, a mobile
node would need to establish channels to multiple base stations. In a multi-hop wireless
network, routing protocols can utilize the mesh structure of the network to provide
multiple loop-free and maximally disjoint paths. Because packet loss events on
different paths are statistically independent, retransmitting the packets over
separate paths maximizes the probability that at least one copy is received
error-free in the least number of retransmissions. With a network loss rate $l$, the error rate
can be reduced to

$$\text{Error Rate} = l^{\,1 + \sum_{i=1}^{L} M_i} \qquad (1)$$

where $L$ is the maximum number of retransmission trials, which is typically determined
by the initial playout delay in the receiver as well as the round-trip delay. $M_i$ is
the number of retransmission copies during the $i$th retransmission, which depends on
the importance of the retransmitted data to the reconstructed video quality. The
maximum number of copies, $\max(M_i)$, is equal to the number of available paths between
the sender and receiver.
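
As a rough illustration of the error-rate expression, the following sketch assumes that losses are independent and that every path has the same loss rate, so a packet is lost only if the original transmission and every retransmitted copy are lost. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def residual_error_rate(loss_rate, copies_per_retry):
    """Residual loss probability after the original send plus redundant retries.

    loss_rate: per-path packet loss probability l (assumed identical and
               independent across paths).
    copies_per_retry: list [M_1, ..., M_L] with the number of copies sent
                      in each of the L retransmission trials.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be a probability")
    # One original transmission plus all retransmitted copies must be lost.
    total_transmissions = 1 + sum(copies_per_retry)
    return loss_rate ** total_transmissions


# One retry allowed (L = 1): a single copy versus copies on three paths.
single = residual_error_rate(0.2, [1])     # 0.2 ** 2
redundant = residual_error_rate(0.2, [3])  # 0.2 ** 4
```

With a single admissible retry, sending the copy on three paths instead of one lowers the residual loss from 0.04 to 0.0016 under these assumptions.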

The scheme is adaptive in the sense that the retransmission overhead will only be 
added when there is loss in the stream, and the degree of the overhead is proportional 
to the importance of the lost packets. To ensure in-time delivery of retransmitted 
packets, and to prevent retransmitting expired packets, the retransmission is con- 
trolled by the packet lifetime, as well as estimate(s) of the path delays. 

The priority for each data unit in the stream is determined by the application. Thus,
in the context of motion-compensated coding, the application can assign a higher
priority to I-frame data than to P- or B-frame data. P-frames might also be assigned
varying priority levels, since P-frames that are closer to the preceding I-frame are
more valuable for preserving picture quality than later P-frames in the group of
pictures (GOP). The prioritization scheme can also be applied on a macroblock basis in
coding schemes that give the encoder the flexibility to select the coding
mode, i.e. intra or inter coding, at the macroblock level [10].

4 Implementation 

We implemented the mechanism as a sub-layer above the Real-time Transport Protocol (RTP) [11].
Fig. 1 shows the system architecture. We refer to this sub-layer as Multiple Path-RTP
(MP-RTP).

MP-RTP is responsible for: 

1. Maintaining the reliability level and the lifetime for each packet, as well as
implementing delay-constrained retransmission,

2. Monitoring the status of the available paths, and selecting the suitable path(s) for 
packet retransmission. 

Fig. 1. System architecture.

For each video frame, the sending application assigns a priority level, which is
based on the frame’s importance to the reconstructed video quality. I-frames are
assigned a higher reliability level than P- or B-frames. P-frames are also assigned
varying reliability levels based on their location in the GOP. In addition, the sending
application calculates the lifetime $T_L(N)$ for each video frame $N$ as follows:

$$T_L(N) = T_R(N) + D_s \qquad (2)$$

where $T_R(N)$ is an estimate of the rendering time of frame $N$ at the receiver, and $D_s$ is
a slack term to compensate for the inaccuracies in estimating the One-Way-Delay (OWD)
from the sender to the receiver, as will be discussed later, as well as for the receiver’s
processing delay. Assuming that there is no compression and/or expansion of total
display time at the receiver, the rendering time for frame $N$, $T_R(N)$, is calculated as
follows:

$$T_R(N) = T_0 + T_D + N/R \qquad (3)$$

where $T_0$ is the video session initiation time and $T_D$ is the receiver’s playout delay, which
determines the rendering time for the first frame in the sequence. The playout delay can be
obtained from the receiver during the session initiation. $R$ is the frame rate. As the
MP-RTP sub-layer receives a frame it fragments it, if required, into multiple packets, 
then RTP headers are added and the packets are sent to the receiver. In addition, a 
copy of each packet is kept in a retransmission buffer, along with its lifetime and 
priority. Typically, all the packets within one frame will have the same lifetime and 
priority. MP-RTP clears packets from the retransmission buffer as it receives the
RTP Control Protocol Receiver Reports (RTCP-RR), which are sent regularly
from the receiver, indicating the highest sequence number received, as well as other
information regarding the quality of the received stream [11]. Initially, packets are
sent on a primary path to the receiver, selected by the sender during the session
initiation.
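
The lifetime computation of equations (2) and (3) and the sender-side retransmission buffer can be sketched as follows. This is a minimal illustration, not the authors' implementation: all class and function names are hypothetical, times are assumed to be in seconds, sequence-number wraparound is ignored, and the buffer makes the simplifying assumption that every packet up to the reported highest sequence number can be dropped.

```python
from dataclasses import dataclass


def rendering_time(frame_no, t0, playout_delay, frame_rate):
    """Eq. (3): T_R(N) = T_0 + T_D + N / R."""
    return t0 + playout_delay + frame_no / frame_rate


def frame_lifetime(frame_no, t0, playout_delay, frame_rate, slack):
    """Eq. (2): T_L(N) = T_R(N) + D_s."""
    return rendering_time(frame_no, t0, playout_delay, frame_rate) + slack


@dataclass
class BufferedPacket:
    seq: int
    payload: bytes
    lifetime: float      # T_L of the frame the packet belongs to
    high_priority: bool  # priority assigned by the sending application


class RetransmissionBuffer:
    """Keeps a copy of each sent packet until it is no longer needed."""

    def __init__(self):
        self._packets = {}

    def store(self, packet):
        self._packets[packet.seq] = packet

    def get(self, seq):
        return self._packets.get(seq)

    def on_receiver_report(self, highest_seq_received):
        # Simplifying assumption: drop everything up to the reported number.
        # A fuller version would retain packets with pending requests.
        for seq in [s for s in self._packets if s <= highest_seq_received]:
            del self._packets[seq]
```

For example, with a 15 fps stream, a 100 msec playout delay and a 20 msec slack, frame 15 would have a lifetime of about 1.12 seconds after session initiation.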

The MP-RTP at the receiver is responsible for sending retransmission requests to 
the sender as soon as it detects a missing packet. The format of the retransmission 
request, shown in Fig. 2, is similar to RTCP-RR [11], except that it is extended to 
include the 32-bit sequence number of the missing packet. As the retransmission
request is susceptible to losses, the MP-RTP retransmits these reports on different 
paths to the sender. 

Fig. 2. Extended RTCP-RR to include the missing sequence number.

MP-RTP uses heartbeat packets, shown in Fig. 3(a), to maintain an estimate of the
RTT of the available paths. The RTT estimate is an exponential average of current and
past RTT measurements. Each heartbeat packet includes a timestamp indicating the
transmission time. The MP-RTP at the receiver responds to the heartbeat packet by
sending a heartbeat-acknowledgement packet, shown in Fig. 3(b), on the same path
from which the heartbeat was received. The heartbeat-acknowledgement includes a
copy of the timestamp in the corresponding heartbeat packet. The RTT estimates are
used to obtain an approximation of the paths' OWD, i.e., $OWD \approx RTT/2$. The
application can compensate for the inaccuracies in the OWD approximation as it assigns the
frames' lifetimes, as shown in equation 2. In addition, MP-RTP uses the RTT estimates
to switch the primary path, which can break due to the mobility in the wireless
network. To minimize the interruption to the interactive video session, when the primary
path RTT increases beyond a certain threshold, MP-RTP sets the alternative path with
the shortest RTT to be the primary path. The switching threshold can be based on the
maximum delay allowed for the interactive video application. Currently, we are using
a fixed value for the switching threshold. In future work, we plan to investigate
techniques to dynamically adapt the value of the switching threshold.
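
The heartbeat-based RTT tracking and primary-path switching described above might look roughly like the sketch below. The class name, method names and the smoothing weight `alpha` are assumptions made for illustration; the paper states only that the estimate is an exponential average and that the threshold is fixed.

```python
class PathMonitor:
    """Tracks a smoothed RTT per path and switches the primary path."""

    def __init__(self, path_ids, primary, switch_threshold, alpha=0.125):
        self.rtt = {p: None for p in path_ids}    # smoothed RTT per path
        self.primary = primary
        self.switch_threshold = switch_threshold  # fixed threshold, seconds
        self.alpha = alpha                        # assumed weight of new sample

    def on_heartbeat_ack(self, path_id, echoed_timestamp, now):
        """Update the RTT estimate from the timestamp echoed in the ack."""
        sample = now - echoed_timestamp
        old = self.rtt[path_id]
        if old is None:
            self.rtt[path_id] = sample
        else:
            self.rtt[path_id] = (1 - self.alpha) * old + self.alpha * sample
        self._maybe_switch_primary()

    def owd(self, path_id):
        """One-way delay approximated as half the smoothed RTT."""
        rtt = self.rtt[path_id]
        return None if rtt is None else rtt / 2.0

    def _maybe_switch_primary(self):
        primary_rtt = self.rtt[self.primary]
        if primary_rtt is None or primary_rtt <= self.switch_threshold:
            return
        # Pick the alternative path with the shortest known RTT.
        candidates = {p: r for p, r in self.rtt.items()
                      if p != self.primary and r is not None}
        if candidates:
            self.primary = min(candidates, key=candidates.get)
```

Note that this sketch only reacts when an acknowledgement arrives; detecting a path that stops answering heartbeats altogether would need an additional timeout, which the text does not describe.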



[Figure: 32-bit-wide packet layouts. (a) Heartbeat packet fields: Version, Padding, Payload Type, Length, time stamp. (b) Heartbeat acknowledgement packet fields: Version, Padding, Payload Type, Length, time stamp.]

Fig. 3. (a) Heartbeat packet (b) Heartbeat acknowledgement packet. 



As soon as the sender receives a retransmission request, it performs the following 
algorithm: 

1. If the lost packet has a low priority, go to step 2; otherwise go to step 3.

2. Check the round trip time estimate $RTT_i$ for all the available paths, maintained
using heartbeat packets. Select the retransmission path $i$ with the minimum $OWD_i$,
such that the following condition holds:

$$T_c + OWD_i < T_L(j) \qquad (4)$$

where $T_c$ is the current time at the sender and $T_L(j)$ is the lifetime for frame $j$, to
which the retransmitted packet belongs.

3. For high-priority packets, the sender selects all the available paths that satisfy
condition (4), and retransmits the packet on these paths simultaneously.

By controlling the retransmission through the frames' lifetimes, as well as the
estimate(s) of the path delay(s), MP-RTP prevents retransmission of expired packets
while trying to meet the frames' lifetime constraints. If no path is suitable in step 2 or
3, the retransmission is discarded, as the packet would not be received before the
rendering time for the frame to which it belongs. At the same time, the upper layer
application is notified about the dropped packet to allow the encoder to utilize schemes, such
as error tracking, to limit the error propagation [2].
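
The sender-side steps above can be sketched as a single selection function. This is an illustrative reading of the algorithm rather than the authors' code; the function name and the dictionary of per-path one-way delay estimates are hypothetical.

```python
def select_retransmission_paths(high_priority, now, frame_lifetime, owd_by_path):
    """Return the path ids to retransmit on; an empty list means discard.

    owd_by_path maps a path id to its estimated one-way delay in seconds
    (None if no estimate is available yet).
    """
    # Condition (4): the copy must be able to arrive before the lifetime ends.
    feasible = [p for p, owd in owd_by_path.items()
                if owd is not None and now + owd < frame_lifetime]
    if not feasible:
        return []  # packet would arrive too late; caller notifies the encoder
    if high_priority:
        return sorted(feasible)  # step 3: send on all feasible paths
    # Step 2: low priority, use only the feasible path with the smallest OWD.
    return [min(feasible, key=lambda p: owd_by_path[p])]
```

For instance, with estimated delays of 50, 30 and 200 msec on three paths and 100 msec left before the frame lifetime expires, a low-priority packet would be resent on the 30 msec path only, while a high-priority packet would be resent on both the 30 and 50 msec paths.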

5 Performance Analysis 

In order to examine the performance of the proposed mechanism, we implemented the
mechanism in the OPNET simulation and modeling tool [12]. We simulated a Multi Path
Transport (MPT) system, with a configurable number of single-hop paths between the
sender and receiver. For simplicity, we assumed that the paths are identical in terms of
available bandwidth, equal to 2.0 Mbps. A two-state Markov model, shown in
Fig. 4, is used to simulate the bursty packet loss behavior in wireless channels [13].




Fig. 4. A two-state Markov model to simulate burst packet losses. 

The two-state model, which is often referred to as the Gilbert channel model, has been
shown to effectively capture the bursty packet loss behavior of wireless
channels. The two states of this model are denoted as Good (G) and Bad (B). In state
G, packets are received correctly whereas, in state B, packets are assumed to be lost.
This model can be described by the transition probabilities $p$ from state G to B and $q$
from state B to G. The average packet loss rate (PLR) is:

$$\text{Average PLR} = \frac{p}{p + q}$$

We vary the error characteristics for channel $i$ by appropriately controlling the
channel Good and Bad durations, according to exponential distributions with
averages $p_i$ and $q_i$, respectively. Delay for channel $i$ is modeled by an exponential
distribution with the mean delay $D_i = 30$ msec. We set the path maximum transmission unit
(MTU) to 400 bytes for all the paths. The heartbeat interval is set to 150 msec.
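
To make the two-state loss model concrete, the sketch below simulates it with per-packet state transitions and checks that the empirical loss rate approaches $p/(p+q)$. It is a simplified, discrete-time stand-in: the simulation described in the text draws Good and Bad durations from exponential distributions, whereas this example flips state once per packet. The function name and parameter values are hypothetical.

```python
import random


def gilbert_losses(n_packets, p, q, seed=0):
    """Simulate a two-state (Good/Bad) loss channel, one step per packet.

    p: probability of moving from Good to Bad after a packet.
    q: probability of moving from Bad to Good after a packet.
    Returns a list of booleans, True where the packet is lost.
    """
    rng = random.Random(seed)
    bad = False  # start in the Good state
    losses = []
    for _ in range(n_packets):
        losses.append(bad)  # every packet sent in the Bad state is lost
        if bad:
            bad = rng.random() >= q  # remain Bad with probability 1 - q
        else:
            bad = rng.random() < p   # enter Bad with probability p
    return losses


losses = gilbert_losses(200_000, p=0.05, q=0.2, seed=1)
empirical_plr = sum(losses) / len(losses)
expected_plr = 0.05 / (0.05 + 0.2)  # p / (p + q) = 0.2
```

Because the chain tends to stay in its current state when $p$ and $q$ are small, lost packets cluster into bursts, which is the behavior the model is meant to capture.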

To generate the video sequence used in our simulation, we used the open-source XviD
MPEG-4 compliant video codec [14]. Sixty seconds of a high-motion video sequence
(football match) are encoded at 15 frames per second (fps), which results in a
sequence of 900 frames. The frame resolution is quarter common intermediate format
(QCIF, 176 x 144 pixels), which is the most common format at low bit rates, and the
coding rate is 200 Kbps. We repeated our experiments with a limited-motion video
sequence (TV news) and obtained results similar to those shown here. We limited the
playout delay at the receiver to 100 msec to represent an interactive video application.
We set the switching threshold, discussed in Section 4, to 200 msec. We selected
this value because, given the channel delays and the playout delay at the receiver,
a primary path RTT higher than this threshold would result in all frames
arriving later than their rendering time at the receiver and being discarded.

The average Peak Signal to Noise Ratio (PSNR) is used as a distortion measure of 
objective quality. PSNR is an indicator of picture quality that is derived from the root 
mean squared error (RMSE). Without transmission losses, the average PSNR of the 
decoded frames for the video sequence used in our performance study is 27 dB. 
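
For reference, PSNR is commonly computed from the RMSE as $20 \log_{10}(\text{peak}/\text{RMSE})$. The sketch below assumes 8-bit samples (peak value 255) and flat lists of pixel values; the paper does not state the exact variant it used, so this is an illustration only and the function name is hypothetical.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 20.0 * math.log10(peak / math.sqrt(mse))
```

An average PSNR for a sequence, as reported in the experiments, would then be the mean of the per-frame values.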

After obtaining a transmission trace of a video sequence, we run the decoder on the 
trace to measure the image distortion due to packet losses, using the PSNR. In order 
to generate statistically meaningful quality measures, for each simulation scenario we 
repeated the experiment ten times with different seeds. The presented PSNR values 
are the average of the ten experiments. 

In our performance study we set the application to choose I-frames and half of the 
P-frames starting from the I-frame in a GOP to be high priority frames, while other 
frames are set to low priority frames. 
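
The priority rule used in the experiments could be expressed as in the sketch below. It assumes a GOP is given as a list of frame types in display order and that "half of the P-frames" rounds up when the count is odd; both the function name and the rounding choice are assumptions, since the text does not specify them.

```python
import math


def assign_priorities(gop_frame_types):
    """Label each frame in a GOP as 'high' or 'low' priority.

    I-frames and the first half of the P-frames (closest to the I-frame)
    are high priority; remaining P-frames and all B-frames are low.
    """
    n_p = sum(1 for t in gop_frame_types if t == "P")
    high_p_budget = math.ceil(n_p / 2)  # assumed rounding for odd counts
    priorities = []
    seen_p = 0
    for t in gop_frame_types:
        if t == "I":
            priorities.append("high")
        elif t == "P":
            seen_p += 1
            priorities.append("high" if seen_p <= high_p_budget else "low")
        else:
            priorities.append("low")
    return priorities
```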

5.1 Effect of Packet Loss Rate on Video Quality 

We tested MP-RTP using two diverse paths, namely path 0 and path 1, between the 
sender and the receiver. Path 0 was selected as the primary path during the video 
session initiation. The channel average packet loss rates for path 0 and path 1 were set 
to 0.2 and 0.1 respectively. We set the encoder so that the I-frame update period, i.e.
the interval between two consecutive I-frames, equals 3 seconds. Fig. 5 shows the PSNR
for each frame in the video sequence. For comparison, we repeated the experiment
using the retransmission scheme with single path retransmissions, where missing
packets are retransmitted on a single path selected randomly from the paths between
the sender and receiver. As can be seen from the figure, the redundant retransmission
scheme is able to maintain the video quality at high packet loss rates. On the
other hand, with the single path retransmission scheme, the video quality can
drop for long durations due to the loss of packets in reference frames: under the
high loss rate, retransmitted packets can also be lost, leading to error propagation in
the following dependent frames up to the next I-frame. Although the sender can keep
retransmitting the packet, the receiver will discard these retransmissions, as they
arrive after the frame rendering time.

Fig. 6 shows the average PSNR over the whole sequence versus different channel
average packet loss rates for the primary path, i.e. path 0. The channel average packet
loss rate for path 1 is set to 0.1. We repeated the same experiment with different
I-frame update periods. For our mechanism we used an I-frame update period equal to 3
seconds. As can be seen in Fig. 6, the single path retransmission scheme achieves
performance similar to MP-RTP only when the I-frame frequency is increased more
than three times to one every 15 frames. As I-frames are larger than P- and
B-frames, increasing the I-frame frequency at the same bit rate translates to reduced
video quality, since more bits are spent on coding I-frames. If the I-frame frequency is
set to one in 45 frames for the single path case, it can be seen that the quality
deteriorates rapidly. Again, this is mostly due to losses in reference frames, as a result
of the high packet loss rate and the bounded delay for interactive video. The errors are
propagated from reference frames to the following frames up to the next I-frame. On the
other hand, redundant retransmissions over diverse paths ensure that, in the single
retransmission allowed, at least one copy of the packet is received, preventing error
propagation.

Fig. 5. PSNR versus frame number.

Fig. 6. Average PSNR versus average packet loss rate.



5.2 Effect of Changing the Number of Paths 

We tested the redundant retransmission mechanism with different numbers of paths
between the sender and receiver. In all experiments the I-frame update period is equal
to 3 seconds.

We varied the channel average packet loss rate on the primary path, i.e. path 0, 
from 0.05 to 0.3. We represented the independent packet losses for the other paths, 
i.e. paths 1-3, by choosing different channel average packet loss rates 0.01, 0.1 and 
0.2 respectively. As can be seen from Fig. 7, with a single path the quality deteriorates 
at high packet loss rates, due to error propagation. With MP-RTP, however, increasing the
number of paths between the sender and the receiver improves the quality, because the
independent loss characteristics of the paths increase the probability that the
retransmitted packets will be received before their deadline.

Fig. 7. Average PSNR versus number of paths.



5.3 Redundant Retransmission Overhead 

In this experiment, we compared the overhead of MP-RTP, due to the redundant re- 
transmissions and heartbeats, to the overhead of error control mechanisms that depend 
on increasing the I-frame frequency to limit the error propagation. 

We define the overhead ratio to be the ratio of the total number of bytes sent in the
I-frame update scheme to the total number of bytes sent in MP-RTP, to attain a given
video quality represented by the average PSNR. In order to calculate the maximum overhead for
MP-RTP, we used 3 paths. We varied the channel average packet loss rate for the 
primary path, path 0, while the channel average packet loss rates for the other paths, 
path 1 and path 2, were set to 0.1 and 0.2 respectively. 

Fig. 8 shows the overhead ratio for an average PSNR equal to 23 dB. As was shown
before, the single path retransmission case required an I-frame frequency of almost 1 per
second, while MP-RTP required 1 per 3 seconds, for a video quality of around 23
dB. It can be seen from the figure that the overhead of our mechanism is less than that 
for the I-frames update scheme. The reason behind this is that the redundant retrans- 
mission mechanism implemented in MP-RTP is adaptive, in the sense that it only 
adds the retransmission overhead when there is loss in the video stream. In addition, 
the degree of the overhead is proportional to the importance of the lost packets. Al- 
though heartbeat packets are periodically sent, they have less contribution to the over- 
head, as they are small in size compared to the size of video frames. 



[Figure: overhead ratio versus average PLR for the primary path (Path 0); average PSNR = 23 dB, Path 1 average PLR = 0.1, Path 2 average PLR = 0.2.]

Fig. 8. Overhead ratio versus average packet loss rate on the primary path.

6 Conclusion

The nature of video encoded using motion compensation requires higher protection 
for reference frames than dependent frames, otherwise errors due to packet losses in 
reference frames propagate to dependent frames. Interactive video complicates the 
problem by bounding the time available for the error control. To tackle these prob- 
lems, we propose a mechanism to provide unequal error protection to data within the 
video stream according to their importance to the reconstructed video quality. The 
unequal error protection is realized by extending classic retransmission-based
error control with redundant retransmissions on diverse paths, in order to increase the
probability that at least one of the retransmitted packets arrives at the receiver in
fewer retransmissions. The degree of redundant retransmission depends on the
reliability level required for the data within the retransmitted packet.
Delay-constrained retransmission, based on the packet lifetime and an estimate of the delay
from the sender to the receiver, is used to prevent retransmitting expired packets. We
implemented the proposed mechanism as an extension to RTP, referred to as Multiple
Path-RTP (MP-RTP). Performance results show that the mechanism is able to provide
good quality for interactive video under different packet loss rates. In addition,
comparing the transmission overhead of the mechanism to the overhead of a reference
frame update error control mechanism shows that, for a given video reconstruction
quality, MP-RTP has less overhead, which is an important feature in
wireless networks.



Disclaimer 

The views and conclusions in this document are those of the authors and should not 
be interpreted as representing the official policies, either expressed or implied, of the

Abstract. Optimizing wireless bandwidth utilization is one of the numerous
challenges in wireless IP multimedia systems design. This paper describes and
evaluates the performance of a novel Application Level Framing protocol for
efficient transmission of H.264 video over error-prone wireless IP links. The
proposed ALF protocol introduces an innovative loss spreading scheme for
video streaming services which is based on (i) a bandwidth-efficient adaptive
H.264 video fragmentation and (ii) an unequal-interleaved protection for
improving FEC efficiency. Both video fragmentation and interleaving are coordinated
at a frame-based granularity, providing bounded end-to-end delays. Performance
evaluation results show that the proposed protocol allows graceful
video quality degradation over error-prone wireless links while minimizing the
overall bandwidth consumption and the end-to-end latency.



1 Introduction 

Wireless communication technology has gained widespread acceptance in recent
years. The IEEE 802.11b [1] standard has led wireless local area networks (LANs) into
greater use, providing up to 11 Mbps of shared bandwidth. With such high bandwidth,
the demand for supporting time-sensitive applications, such as video-on-demand and
interactive multimedia, in wireless LANs has been increasing. Meanwhile, the recently
adopted ITU-T H.264 standard [2] (also known as ISO/IEC International Standard 14496
Part 10) achieves efficient video encoding and bandwidth savings. H.264 experts have
taken into account transmission over packet-based networks in the video codec design
from the very beginning. The overall performance of H.264 is such that bit rate
savings of 50% or more, compared to the current state of technology, are reported.
Digital Satellite TV quality, for example, was reported to be achievable at 1.5 Mbit/s,
compared to the current operation point of MPEG-2 video at around 3.5 Mbit/s. In this
paper, we investigate H.264 video multicast communications over IEEE 802.11b wireless
LAN, although the proposed protocol is network independent and can support various
media types as well. 

In previous work [3, 4], we addressed the wireless video communication issue from an
application point of view. We proposed a multimedia elementary streams classification
and aggregation that provides wireless bandwidth savings and packet loss tolerance.
However, the intrinsic wireless link characteristics involve unpredictable burst
errors that are usually uncorrelated with the instantaneous available bandwidth. The
resulting packet losses and bit errors can have devastating effects on multimedia
quality. To overcome the residual BER (Bit Error Rate), error control of video
streams is generally required. Error control mechanisms are 



J. Vicente and D. Hutchison (Eds.): MMNS 2004, LNCS 3271, pp. 13-25, 2004. 
© IFIP International Federation for Information Processing 2004 



14 Abdelhamid Nafaa et al. 



popular for dealing with packet loss and delay over bandwidth-limited fading wireless 
channels. Such mechanisms involve Forward Error Correction (FEC), Automatic 
Retransmission ReQuest (ARQ), and error resilience tools. FEC has been commonly 
suggested for real-time applications due to (i) its proven scalability for multicast 
communications and (ii) the strict delay requirements of media streams. 

Typical communications over wireless networks involve high bit error rates that
translate to correlated losses of adjacent packets. In this case, the classical
adaptive FEC approaches [5, 6] can be inefficient since they involve excessive
bandwidth usage. Such approaches use FEC to protect consecutive original data packets,
which reduces their effectiveness against burst packet losses. This often implies
transmitting additional FEC packets to overcome the increasing BER. Hence, we propose
a novel low-delay interleaved FEC protection scheme. The idea is to spread the burst
loss before FEC recovery, so that the burst loss manifests itself as a number of
disjoint packet losses in the FEC-recovered data stream. This process first consists
of adaptively fragmenting the H.264 Frames in order to achieve better link
utilization. The second phase applies an unequal-interleaved media packet protection,
which takes into account the relevance of the H.264 video Frames. Thus, our protocol
minimizes the consequences of burst errors, as well as the video distortion at the
receivers' side, while minimizing the overall bandwidth consumption. 

The remainder of this paper is organized as follows. Section 2 investigates reliable
H.264 video multicasting over Wireless LAN. The proposed protocol is then presented in
Section 3. Section 4 is devoted to the performance evaluation. Finally, Section 5
concludes the paper. 

2 Related Works on H.264 Streaming over Wireless LAN 

2.1 H.264 over Wireless IP Networks 

A new feature of the H.264 design [7] is the conceptual separation between the Video
Coding Layer (VCL), which provides the core high-compression representation of the
video picture content, and the Network Adaptation Layer (NAL), which packages that
representation for efficient delivery over a particular type of network. 

The H.264 NAL design provides the ability to customize the format of the VCL data for
delivery over a variety of particular networks. Therefore, a unique packet-based
interface between the VCL and the NAL is defined. The packetization and appropriate
signaling are part of the NAL specification, which is not necessarily part of the
H.264 specification itself. For the transmission of video over WLANs with limited
bandwidth and transmission power resources, high compression efficiency is an obvious
requirement. Besides, adapting the video data to network fluctuations is an additional
important task due to the special properties of the wireless channel. These two design
goals, compression efficiency and network friendliness, motivate the differentiation
between the VCL, for coding efficiency, and the NAL, which takes care of network
issues. In the H.264 framework, all information that was traditionally conveyed in
sequence, group-of-picture, or picture headers is conveyed out of band. During the
setup of the logical channel, the capability exchange takes place. This procedure was
already the subject of many discussions within H.264, and it was agreed that a simple
version/profile/level concept should be used; work is currently underway in the
IETF [9] to enable such features. 

2.2 Reliable Multicast Communications over Wireless LAN 

In order to communicate reliably over packet-erasure channels, it is necessary to
exert some form of error control [8]. Two classes of communication protocols are used
in practice to reliably communicate data over packet networks: synchronous and
asynchronous. Asynchronous communication protocols, such as ARQ, operate by dividing
the data into packets and appending a special error check sequence to each packet for
error detection purposes. The receiver decides whether a transmission error occurred
by calculating the check sequence. For each intact data packet received on the forward
channel, the receiver sends back an acknowledgement. While this model works very well
for data communication, it is not suitable for multimedia streams with hard latency
constraints. The maximum delay of the ARQ mechanism is unbounded, and in the case of
live streaming it is necessary to interpolate late-arriving or missing data rather
than insert a delay in the stream playback. In synchronous protocols (i.e. FEC-based
protocols), the data are transmitted with a bounded delay but generally not in a
channel-adaptive manner. FEC codes are designed to protect data against channel losses
by introducing parity packets. No feedback channel is required. If the number of lost
packets is less than the decoding threshold for the FEC code, the original data can be
recovered perfectly. 
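
To make the parity-packet idea concrete, the sketch below uses the simplest possible
erasure code: a single XOR parity packet over a block of equal-length packets, which
can repair exactly one lost packet. This is a stand-in chosen for brevity, not the
Reed-Solomon code used later in this paper, and the function names are hypothetical.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equal-length packets (the parity packet)."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for pos, byte in enumerate(packet):
            parity[pos] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild a block in which exactly one packet (marked None) was lost."""
    missing = [idx for idx, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet repairs exactly one loss")
    present = [packet for packet in received if packet is not None]
    repaired = list(received)
    # XOR of all surviving packets and the parity yields the lost packet.
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


block = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(block)
print(recover_missing([b"abcd", None, b"ijkl"], parity) == block)
```

No feedback from the receiver is involved: the repair uses only what arrived. With two
or more losses in the block this code fails, which is the "decoding threshold" of this
particular code.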

ARQ-based schemes are not appropriate for the multicast case for three reasons: (i)
ACK explosion, which scales with the multicast group size; (ii) for significant loss
rates, each user will require frequent packet retransmissions that are probably
useless for the other multicast clients; and (iii) unbounded data transmission delays.
Hence, FEC-based error control seems more appropriate for real-time multicast
communication. 

The IEEE 802.11 standard [1] uses the same logical link layer as other 802-series
networks (including the 802.3 wired Ethernet standard), and uses compatible 48-bit
hardware Ethernet addresses to simplify routing between wired and wireless networks.
As in wired Ethernet, corrupted packets are dropped at the link layer (i.e. packets
with bit errors are unavailable to a multimedia application). Communication is
complicated by the inability of radio transceivers to detect collisions as they
transmit, and by the potential for devices outside the network to interfere with
network transmissions. Communication is also hampered by the hidden node problem.
Therefore, the IEEE 802.11b standard uses a complex MAC protocol to cope with these
wireless communication specificities. The basic medium access protocol is the DCF
(Distributed Coordination Function), which allows medium sharing through the use of
CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance). In addition, all
directed traffic uses immediate positive acknowledgment (ACK frame), and a
retransmission is scheduled by the sender if no ACK is received. That is, within IEEE
802.11b unicast communications, all wireless data frames are acknowledged by the
receiver. Furthermore, each retransmission introduces additional latency because it
triggers the collision avoidance routine. The sender uses limited retransmission
persistence, so the data can be dropped at the source after several retransmission
attempts. In the case of multicast or broadcast traffic, however, the data packets are
not acknowledged, and hence no retransmission is performed at the MAC/logical link
layer; this mode of communication reduces transmission delays while making
communications less reliable. 

2.3 Specific Related Works 

Nowadays, most reliable multicast video distribution protocols propose the use of ARQ
(see [10] and references therein). Besides, FEC schemes with different characteristics
for multicast streaming have been extensively studied in the literature [6, 12]. In a
multicast scenario, to tackle the problem of heterogeneity and to ensure graceful
quality degradation, the use of multi-resolution-based scalable bitstreams has been
previously suggested in [10] and [13]. These approaches are, however, dedicated to
multilayer video coding throughout their design. 

In wireless communications, packet loss can exhibit temporal dependency or burstiness.
For instance, if packet n is lost, packet n + 1 is also likely to be lost. This
translates to burstiness in network losses, which may worsen the perceptual quality
compared to random losses at the same average loss rate. As a consequence, the
performance of FEC, e.g. the percentage of packets that cannot be recovered, is
affected. Moreover, the final loss pattern after FEC recovery could be even burstier
due to the dependency between losses, which affects audio/video quality and the
effectiveness of loss concealment. In order to reduce burst loss, redundant
information has to be added into temporally distant packets, which introduces even
higher delay. Hence, the repair capability of FEC is limited by the delay budget [14].
Another sender-based loss recovery technique, interleaving, which does not increase
the data rate of transmission, faces the same dilemma. The efficiency of loss recovery
depends on how many packets the source packet is interleaved and spread over. Again,
the wider the spread, the higher the introduced delay. 

In this paper, we introduce a novel technique that combines reliability and effi- 
ciency advantages of both interleaving and FEC coding. Our proposal is based on an 
adaptive H.264 streams fragmentation coordinated with an unequal-interleaved pro- 
tection scheme. Thus, we improve the error resiliency while minimizing the overall 
bandwidth consumption and still meeting the delay constraints. 

3 Unequal Interleaved FEC Protocol 
for Reliable Wireless Video Multicasting 

3.1 Adaptive H.264 Video Fragmentation and Packetization 

Mobile devices are typically hand-held and constrained in processing power. In
addition, the mobile environment is characterized by harsh transmission conditions in
terms of fading and multi-user interference, which result in time- and
location-varying channel conditions. Therefore, a mobile video codec design must
minimize terminal complexity while still remaining efficient. Consequently, in our
work we focus specifically on the H.264 Baseline Profile to reduce the receiver's
decoder complexity. In this H.264 coder release, the data partitioning features are
not enabled. 

Internally, the NAL uses NALUs (NAL Units) [9]. A NAL unit consists of a one-byte
header and the payload byte string. A complete separation between the VCL and the NAL
is difficult to obtain because some dependencies exist. The packetization process is
an example: error resilience is improved if the VCL is instructed to create slices of
about the same size as the packets and the NAL is told to put only one slice per
packet. The error resilience of the transmission benefits because all the data
contained in a given packet can be decoded independently of the others. Note that in
H.264, the subdivision of a Frame into slices need not be the same for each Frame of
the sequence; thus the encoder can flexibly decide how to make the slices. However,
slices should not be too short, because the compression ratio would decrease for two
reasons: the slice headers would reduce the available bandwidth, and the context-based
entropy coding would become less efficient. Moreover, in wireless channels, the size
of a transmitted packet influences its error probability; longer packets are more
likely to contain transmission errors [15, 16]. In this paper, the trade-off involved
in the packet creation process is investigated by studying the performance of the
video transmission as a function of the packet size. We try to find, for each Frame to
be transmitted, the optimal slice size that maximizes the bandwidth utilization,
taking into account the fluctuating wireless link conditions (it is assumed that a
NALU corresponds to a Slice). 

Let FRAMEsize be the Frame size, S the packet size in bytes, oh the header size in
bytes and Lr the wireless channel loss rate. The link utilization, U, is then given by
(1). U represents the ratio of the original video data to the transmitted data (i.e.
including the FEC redundancy). 

\[ U = \frac{FRAMEsize}{\frac{FRAMEsize}{S - oh} \cdot S + \frac{L_r \cdot FRAMEsize}{S - oh} \cdot S} \qquad (1) \]

where NALUsize = S - oh and oh + 1 ≤ S ≤ MTU. In Fig. 1, U is plotted against S for
FRAMEsize = 10000 and oh = 40 at four different loss rates (i.e. Lr = 0, 0.05, 0.2 and
0.3). 


Fig. 1. Correlation between packet size and link utilization (x-axis: packet size in
bytes). 



Fig. 1 depicts the wireless channel utilization for different packet loss rates. A
small packet size clearly provides better performance when the channel loss rate is
high. On the other hand, systematically choosing a small packet size does not
necessarily give good channel utilization. The maximum of the utilization function is
obtained by solving the equation for S. Thus, NALUSize is determined for each Frame to
be transmitted based on the measured loss rate. The fragmentation/encapsulation
presented here provides better wireless link utilization. Moreover, this scheme
minimizes the dependency between adjacent RTP packets, which mitigates the dependency
of the H.264 decoder on any lost packet. 
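
The per-Frame choice of packet size can be sketched as a brute-force search over the
admissible range of S. The sketch assumes that packet counts are rounded up to whole
packets, which is what makes the best S depend on the loss rate here; this rounding,
the default MTU of 1500 bytes and the function names are assumptions of the
illustration, not details confirmed by the text.

```python
import math


def link_utilization(frame_size, s, oh, lr):
    """Original video bytes divided by transmitted bytes for packet size s.

    Assumes whole packets: media and FEC packet counts are rounded up.
    """
    payload = s - oh                       # NALUsize = S - oh
    media_packets = math.ceil(frame_size / payload)
    fec_packets = math.ceil(lr * frame_size / payload)
    return frame_size / ((media_packets + fec_packets) * s)


def best_packet_size(frame_size, oh, lr, mtu=1500):
    """Exhaustively search oh + 1 <= S <= MTU for the highest utilization."""
    return max(range(oh + 1, mtu + 1),
               key=lambda s: link_utilization(frame_size, s, oh, lr))


# Same setting as Fig. 1: FRAMEsize = 10000 bytes, oh = 40 bytes.
for lr in (0.0, 0.05, 0.2, 0.3):
    s = best_packet_size(10000, 40, lr)
    print(lr, s, round(link_utilization(10000, s, 40, lr), 3))
```

An exhaustive search is affordable because it runs once per Frame over at most MTU
candidate sizes; a closed-form solution would need the exact form of the utilization
function.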

3.2 Unequal Interleaved FEC Protection Protocol 

Fig. 2 illustrates a sample trace we obtained using an 802.11 AP (Access Point) and
wireless receivers. Each point represents the packet loss rate measured at each
receiver; we emphasize the channel behavior when multicasting video over wireless
links. The AP sent multicast H.264 video packets and the receiving stations recorded
the sequence numbers of the correctly received packets. This experiment reveals a
significant number of adjacent packet losses as a consequence of the wireless burst
errors. Furthermore, the packet loss rate is different at each receiver due to
location-varying channel conditions. 



Fig. 2. Correlated multicast receivers packet losses. 



Since wireless link fluctuation usually occurs through unpredictable adjacent packet
losses, we propose to use an interleaved FEC protection. As depicted in Fig. 3, the
redundant FEC packets protect temporally scattered RTP packets in order to cope with
sporadic wireless packet losses. This increases the FEC efficiency by improving the
error resiliency at the clients' side. 



Fig. 3. Scattered video packets protection (diagram labels: burst, RTP media packets,
FEC packets). 

Within UI-FEC, time is divided into transmission rounds. A transmission round ends
when the sender transmits the last packet of a Frame. Each Frame is divided into
several UBs (Unequal loss protection Blocks). A UB consists of n = k + h packets (see
Fig. 5). At this point, we define the interleaving factor (i) as the Sequence Number
(SN) difference between each two successive protected RTP packets in the UB. The
interleaving factor is fixed for the k protected RTP packets belonging to the same UB
(see Eq. 2). When i = 1, the interleaved protection is not applied and, hence, the FEC
packets protect consecutive RTP packets. Fig. 4 summarizes a working example of the
UI-FEC protocol for an interleaving factor i = 3. In this case, the adjacent video
packets (in terms of transmission order) are protected in different FEC blocks. Here,
the interleaving factor represents the interleaving stride (see footnote 1). 




Fig. 4. UI-FEC Protocol. 

Note that for a given Frame (F), the interleaving factor (i) represents the number of
UBs constituting F. Moreover, i is fixed for all UBs belonging to the same Frame. For
synchronization purposes, the protected media data of a given UB must belong to the
same Frame. In other words, each Frame is transmitted as an integer number of UBs.
After applying FEC, the media packets are transmitted in their initial order (i.e.
according to the sequence number order). Note that the delays introduced by the FEC
interleaving do not have important consequences, since the interleaving is applied
over a single Frame. The induced delays can be absorbed by an initial buffering time.
This novel interleaving scheme over a single Frame is used to provide an adaptive and
unequal FEC protection. 
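
The effect of the interleaving factor on a burst can be shown with a small sketch that
groups the RTP sequence numbers of one Frame into i UBs with stride i and counts how
many losses each UB has to repair. The helper names are hypothetical and sequence
number wrap-around is ignored for brevity.

```python
def assign_to_ubs(seq_numbers, i):
    """Split a Frame's RTP sequence numbers into i UBs using stride i."""
    return [seq_numbers[offset::i] for offset in range(i)]


def losses_per_ub(ubs, lost):
    """Count, for each UB, how many of its protected packets were lost."""
    lost = set(lost)
    return [sum(1 for sn in ub if sn in lost) for ub in ubs]


frame = list(range(100, 112))          # 12 media packets of one Frame
burst = [104, 105, 106]                # three consecutive losses

print(assign_to_ubs(frame, 3))
print(losses_per_ub(assign_to_ubs(frame, 3), burst))   # interleaved, i = 3
print(losses_per_ub(assign_to_ubs(frame, 1), burst))   # classical FEC, i = 1
```

With i = 3 each UB loses a single packet, so one redundant packet per UB is enough to
repair the burst; with i = 1 the only UB must absorb all three losses.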



1 By interleaving stride, we mean the separation (in terms of packet transmission order) be- 
tween two consecutive data packets in the same FEC block. This is useful in spreading the 
burst loss, so that the burst loss manifests itself as a number of disjoint packet losses in the 
FEC-recovered data stream. 

Fig. 5. UB structure details. 



For FEC efficiency [6], it is desirable to keep the number of information packets (k)
as high as possible with few redundant packets. A higher number of information packets
leads to better granularity, which allows adjusting the redundancy rate more precisely
according to wireless channel conditions. As an example, if a Frame to be transmitted
has a size of 10 Kbytes, a packet payload size (NALUSize) of 1 Kbyte will result in 10
RTP packets. The error protection can then be adjusted only in steps of 10%. 

In our study, we take a minimum of 10 information packets for each UB (i.e.
guaranteeing a minimum precision of 10%). This is ensured by using an appropriate
l ≥ 10, taking into account the mean Frame size (see formula (2)). For a fixed l, both
the interleaving factor (i) and the number (k) of media packets per UB are then easily
determined, where FRAMESize is obtained from the Frame header, while NALUSize is fixed
by our H.264 video packetization process (see Section 3.1). 



\[ k = \frac{FRAMESize}{i \cdot NALUSize}, \quad \text{where} \quad i = \frac{FRAMESize}{l \cdot NALUSize} \qquad (2) \]



It is clear that the interleaving factor is tightly dependent on the Frame size. Since
the different Frames (pictures) of a coded H.264 video sequence have different sizes,
the interleaving factor (i) scales along with the Frame size. Consider that an
intra-coded picture (I-picture) is larger than an inter-coded picture (P-picture or
B-picture). A large inter-coded picture conveys many motion vectors and much
prediction error information. In other words, the larger the inter-coded picture is,
the more it codes a highly changing scene with different texture, and the more video
distortion it causes when lost. As a consequence, within UI-FEC, the interleaving
protection is applied differently based on Frame size and hence Frame relevance, which
provides better loss recovery for the most sensitive parts of the H.264 bitstream.
Thus, the different Frames are interleaved unequally, protected, then transmitted
based on their relevance, without using any prior video stream classification scheme. 
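
How the interleaving factor follows the Frame size can be illustrated by evaluating
formula (2) for a large and a small Frame. Packet counts must be integers, so this
sketch rounds i down and k up; that rounding, the floor of i at 1 for small Frames,
and the function name are assumptions made for the example.

```python
import math


def interleaving_parameters(frame_size, nalu_size, l_min=10):
    """Return (i, k) for one Frame, following formula (2) with integer rounding.

    i is rounded down so that each UB keeps about l_min or more media packets;
    for Frames with fewer than l_min packets, i falls back to 1.
    """
    packets = math.ceil(frame_size / nalu_size)   # media packets in the Frame
    i = max(1, packets // l_min)                  # interleaving factor = number of UBs
    k = math.ceil(packets / i)                    # media packets per UB
    return i, k


# A large (e.g. intra-coded) Frame is spread over more UBs than a small one.
print(interleaving_parameters(30000, 500))   # 60 packets
print(interleaving_parameters(6000, 500))    # 12 packets
```

In this sketch the last UB of a Frame may carry fewer than k packets when the packet
count is not a multiple of i; handling that case is left out.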

The adaptive FEC scheme proposed in this paper is based on systematic RS(n, k) codes,
where n, k and i are reassigned for each Frame to be transmitted based on the priority
of the Frame and the current loss rate of the communication channel. When using k
source data packets in a UB, transmitting h FEC packets provides erasure resiliency
against a packet loss rate of h/n. Therefore, it is easy to calculate the number of
redundant packets (h) from the packet loss rate (p) and the number (k) of original
media packets of the current UB to be transmitted (see formula (3)). 



\[ h = \frac{p \cdot k}{1 - p} \qquad (3) \]



The amount of redundancy introduced is computed once for each Frame to be transmitted,
depending on (i) the current transport-level packet loss rate (i.e. measured before
FEC recovery) and (ii) the video fragmentation parameters. This is achieved according
to (1) and (3). 
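
The redundancy computation follows from requiring h/n ≥ p with n = k + h, which gives
h ≥ p·k/(1 − p). A minimal sketch is shown below; rounding h up to a whole packet and
the function name are assumptions of the example.

```python
import math


def redundant_packets(p, k):
    """Smallest whole number of FEC packets h such that h / (k + h) >= p."""
    if not 0 <= p < 1:
        raise ValueError("loss rate p must satisfy 0 <= p < 1")
    return math.ceil(p * k / (1 - p))


k = 10
for p in (0.0, 0.05, 0.2, 0.3):
    h = redundant_packets(p, k)
    print(p, h, round(h / (k + h), 3))   # loss rate, FEC packets, protected rate h/n
```

Because h is rounded up, the protected loss rate h/n is at least p; with a small k the
overshoot can be noticeable, which is the granularity argument made above for keeping
k at 10 or more.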



3.3 UI-FEC Signaling 

In this section, we highlight the key components involved in UI-FEC deployments. Since
successive UBs will have different structures, we propose a simple signaling of the
parameters required for UB decoding (see Fig. 6). Within the FEC stream (i.e. in the
FEC header), we transmit the base RTP sequence number (BSeq) of the protected RTP
packets, the UB size (n), the number of protected original data packets (k), and the
interleaving factor (i). Thus, both interleaving and redundancy are adjusted to match
the overall measured BER. 



RTP header | FEC header | FEC data (Reed-Solomon redundancy) 

FEC header fields: BSeq | n | k | i 

Fig. 6. FEC parameters signaling. 



This signaling is sufficiently flexible to provide an adaptive error control based on 
the signaled FEC interleaving parameters. In addition, the original media stream is 
decoupled from the FEC stream and its associated parameters, which allows clients 
without FEC capabilities to decode the original media stream. 
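
The signaling can be sketched as a small fixed-size header carried in each FEC packet.
The field widths below are hypothetical: BSeq is given 16 bits to match the width of
an RTP sequence number, and n, k and i are given one byte each. The text does not
specify the actual layout, so this is only one possible encoding.

```python
import struct

# Hypothetical layout, network byte order: BSeq (16 bits), n, k, i (8 bits each).
FEC_HEADER = struct.Struct("!HBBB")


def pack_fec_header(bseq, n, k, i):
    """Serialize the UB parameters signaled in the FEC header."""
    return FEC_HEADER.pack(bseq, n, k, i)


def unpack_fec_header(data):
    """Parse the UB parameters from the start of an FEC payload."""
    bseq, n, k, i = FEC_HEADER.unpack_from(data)
    return {"bseq": bseq, "n": n, "k": k, "i": i}


def protected_sequence_numbers(bseq, k, i):
    """RTP sequence numbers covered by the UB, with 16-bit wrap-around."""
    return [(bseq + j * i) % 65536 for j in range(k)]


header = pack_fec_header(bseq=65530, n=13, k=10, i=3)
fields = unpack_fec_header(header)
print(len(header), fields)
print(protected_sequence_numbers(fields["bseq"], 4, fields["i"]))
```

A receiver that understands the header can rebuild the list of protected packets from
BSeq, k and i alone, while a receiver without FEC support can simply ignore the FEC
stream.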



4 Performance Evaluation 

UI-FEC is evaluated with Network Simulator v2 using different network configurations.
We compare the robustness and efficiency of our proposal with those of the classical
FEC protection (i.e. non-interleaved FEC protection) for wireless multimedia
communications. 



4.1 Simulation Model 

Video multicasting applications are considered for the simulation. With our proposed
technique, the H.264 multicast server generates FEC packets that protect interleaved
RTP media packets, while with the classical approach, FEC protection is applied to
consecutive RTP media packets. Both approaches are evaluated for the same network
configurations and with the same amount of FEC protection. In our simulation, we use a
QCIF Foreman H.264 coded sequence with a constant quantization parameter of 10
(sequence parameters are given in Table 1); the video sequence was generated using the
current release of the TML software, JM80. We chose a highly dynamic video sequence in
order to highlight the efficiency of the unequal-interleaved protection. 



Table 1. Source Video Statistics. 

Original video             Video configuration       Average bit-rate   Frame frequency
Foreman (13.33 seconds)    H.264 Baseline Profile    1 783 Kb/s         30 Frames/s



We use the network simulation models depicted in Fig. 7 for evaluating and com- 
paring our proposal with the classical approach. The MPEG-4 sender attached to the 
node “1” transmits a multicast MPEG-4 stream to the wireless receivers “5” and “7”. 
We include constant-bit-rate (CBR) traffic over UDP to allow loading the network 
differently each time in order to get further information about UI-FEC behavior. 




Fig. 7. Network model. 



4.2 Results Analysis 

Fig. 8 represents the final packet loss rates measured for each received H.264 Frame
after FEC recovery; it reveals relatively high bit error rates due to (1) wireless
channel burst errors and (2) the absence of MAC-level retransmissions in multicast
communications. Note that the measured high loss rates are often caused by temporally
consecutive packet losses affecting the same Frame. We observe that, for the same
network conditions, UI-FEC increases error resiliency at the receivers' side by
recovering more RTP packets. 

Fig. 9 illustrates the bandwidth consumption measured during the simulation (i.e. for
400 Frames). UI-FEC is more bandwidth efficient than the classical FEC; the measured
mean bandwidth saving is around 76 Kbps. This bandwidth saving is principally due to
better error resiliency, which implies reduced FEC transmission. We observed that the
unequal-interleaved FEC scheme provides better robustness in networks with a high BER
and, consequently, a likely small MTU size. Moreover, UI-FEC behaves better when
transporting high bit-rate video streams (i.e. video streams with a large mean Frame
size). 




Fig. 8. Instantaneous loss rates (x-axis: time in seconds). 



Fig. 9. Instantaneous bandwidth consumption (classical FEC vs. UI-FEC). 

We experienced 4 dropped Frames that could not be decoded when transmitting video
protected by the classical adaptive FEC, whereas we were able to decode the whole
video sequence (400 Frames) when using the UI-FEC protocol. Fig. 10 depicts the
objective quality measurements of the reconstructed video streams when transmitted and
protected using both FEC techniques. It is clear that UI-FEC protection achieves
smoother video quality degradation than classical FEC protection; the average PSNR
gain over the whole communication is around 2.18 dB. 
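
For reference, the objective quality metric used here can be computed as in the
following sketch, which applies the standard PSNR definition to two equally sized
lists of 8-bit sample values. Working on flat lists and the function name are
simplifications made for this illustration; the measurement tool used in the
experiments is not described in the text.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf            # identical signals
    return 10 * math.log10(peak ** 2 / mse)


reference = [100, 100, 120, 130]
decoded = [101, 99, 121, 129]      # every sample off by one, so MSE = 1
print(round(psnr(reference, decoded), 2))
```

An average PSNR gain is then the mean, over all decoded Frames, of the difference
between the per-Frame values obtained with the two protection schemes.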



Fig. 10. Objective video quality measurements (PSNR, UI-FEC vs. classical FEC). 



5 Conclusion 

In this paper we designed, implemented and tested a new Application Level Framing
protocol named UI-FEC to cope with the problem of efficiently streaming real-time
compressed video over error-prone wireless links. A distinct feature of UI-FEC is its
efficient wireless bandwidth management by means of an adaptive video stream
fragmentation coupled with an unequal-interleaved FEC protection. UI-FEC performance
was evaluated by simulation with an H.264 multicast distribution service. Analysis of
the results shows that UI-FEC offers considerable gains over conventional adaptive FEC
in effectively protecting sensitive H.264 video frames, consequently improving both
bandwidth utilization and end-user perceived video quality. 


Abstract. Real-time video streaming with rate adaptation to network
load/congestion represents an efficient solution to its coexistence with
conventional TCP data services. Naturally, the streaming rate control
must be efficient, smooth and TCP friendly. As multimedia clients become
mobile, these properties must also be preserved over wireless links.
In particular, they must be robust to random wireless losses. Existing
schemes such as TCP Friendly Rate Control (TFRC) perform well in the
wired Internet, but show serious performance degradation in the presence
of random wireless losses. In this paper we introduce the Video
Transport Protocol (VTP) with a new rate control mechanism based on
the Achieved Rate (AR) estimation and Loss Discrimination Algorithm.
We show that VTP can preserve efficiency without causing additional
performance degradation to TCP, in both error-free and error-prone
situations. 



1 Introduction 

Real-time video streaming is becoming increasingly important on the Internet.
Unlike conventional applications, real-time streaming generally requires a
minimum, continuous bandwidth guarantee as well as stringent bounds on delay and
jitter. Earlier work largely relied on unresponsive UDP traffic and posed a
potential threat to network stability. Thus, more recent research has focused
on adaptive schemes that respond to the network dynamics and avoid possible
congestion collapse. 

TCP, the dominant transport protocol on the Internet, has also been considered
for streaming [11]. However, the instantaneous sending rate of TCP changes so
drastically that buffering is needed at the receiver to accommodate rate
fluctuations [14]. Buffering smooths the playback rate but also raises two
concerns. First, it causes a startup delay. For Video-on-Demand (VoD)
applications, startup delays of a few seconds or slightly longer are tolerable, but
for real-time, interactive applications, e.g. video conferencing and online gaming,
startup delays have to be tightly bounded [16]. The second concern is that more
and more mobile/wireless devices are connected to the Internet. These devices
are often small and inexpensive with limited computation and buffer capacities;
storing a large amount of data is simply impractical. 



J. Vicente and D. Hutchison (Eds.): MMNS 2004, LNCS 3271, pp. 26—38, 2004. 
(c) IFIP International Federation for Information Processing 2004 

To address these concerns, real-time streaming needs more intelligent rate
adaptation or rate control mechanisms. Solutions are usually based on two types of
feedback: a) cross-layer feedback from lower layers [9], or b) end-to-end feedback.
On the Internet, cross-layer approaches require modifications on both end hosts
and intermediate nodes, which is not practical; thus end-to-end rate control has
been the preferred choice [3] [7]. 

TCP Friendly Rate Control (TFRC) [7] is one of the most popular end-to-end
streaming protocols and is often used as a reference and benchmark. TFRC
attempts to match the long-term throughput of legacy TCP (e.g. Reno) and is
smooth, fair and TCP friendly in wired networks. However, with the increasing
popularity of wireless Internet terminals and the demand for delivering
multimedia to mobile users, streaming protocols must also work efficiently
on wireless links and withstand high random error rates. Legacy
TCP does not work well in this case; it tends to over-cut its window, which
severely degrades performance. Since TFRC attempts to faithfully match the
throughput of TCP, it suffers the same low efficiency in the presence of moderate
to high random errors [18].

Our goal is to develop a real-time streaming protocol that behaves well in the
wired Internet, is robust to random errors, and can be deployed
over wireless links. We have proposed the Video Transport Protocol (VTP) [2],
which measures the Achieved Rate (AR) and adapts its sending rate according
to the network dynamics. However, we have recently found that the original VTP
tends to be unfriendly to TCP in some scenarios. The main contribution of this
paper is to refine the VTP rate control. The new mechanism should provide
efficient and smooth rate control in both error-prone and error-free situations,
while maintaining fairness and friendliness to coexisting flows.

The rest of the paper is organized as follows: Section 2 lists our design goals
for the VTP rate control. The Achieved Rate (AR) estimation and the Loss
Discrimination Algorithm (LDA) are introduced in Section 3, followed by the VTP rate
control mechanism in Section 4. We evaluate the performance of VTP in the
ns-2 simulator in Section 5. Related work is summarized in Section 6, and
Section 7 concludes the paper.

2 Design Goals 

In this section we discuss the main design goals of the VTP rate control mech- 
anism, namely robustness to random errors and TCP friendliness. 



2.1 Robustness to Random Errors 

As the Internet evolves into a mixed wired-cum-wireless environment, more and
more devices are interconnected via wireless technologies. Wireless links are
usually error-prone due to interference, noise, fading, mobility, etc. [13]. However,
popular error recovery techniques, such as Automatic Repeat reQuest (ARQ)
and Forward Error Correction (FEC), may not completely solve this problem.
First of all, ARQ increases both the end-to-end delay and its variance, which
is undesirable for real-time streaming. Applying ARQ in a single FIFO queue,
as done in the majority of commercial MAC layer implementations, also
introduces the problem of head-of-line blocking, where retransmission of a
packet forces subsequent packets in the same queue to wait. On the other hand,
FEC is more effective when errors are sporadic, whereas in practice errors are
usually bursty due to interference from external sources. In conclusion, even after
applying limited ARQ/FEC where appropriate, packet error rates of a few percent
or higher are still expected in wireless networks [17]. This is the key working
assumption that motivates the rest of the paper. The first design goal of VTP is
to provide efficient streaming rate control in the presence of random wireless errors.



2.2 TCP Friendliness 

TCP is deployed on virtually every computer. Years of operation have shown
that the well-designed congestion control in TCP contributes significantly to the
stability of the Internet. New protocols must be TCP friendly to avoid potential
congestion collapse.

Different definitions of “TCP friendliness” exist in the literature. A widely 
used one is based on Jain’s fairness index [8], which belongs to the class of 
max-min fairness. Applying Jain’s fairness index to TCP friendliness results in 
a statement like “a flow of the new protocol under evaluation must achieve a 
rate similar to the rate achieved by a TCP (usually Reno/NewReno) flow that 
observes the same round-trip time (RTT) and packet loss rate”. VTP must com- 
ply with this definition in the region where TCP performs efficiently (i.e., with 
zero random errors) and can potentially use the entire bandwidth. In the case 
of frequent random errors, however, legacy TCP cannot achieve full bandwidth 
utilization. Thus the conventional definition of friendliness must be modified 
to allow a new, more efficient protocol to opportunistically exploit the unused 
bandwidth, even beyond the “fair share”. 
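
As a small illustration of the fairness measure mentioned above, the following Python sketch computes Jain's fairness index in its standard form, (sum of rates)^2 / (n * sum of squared rates). The function name and the handling of an all-zero allocation are assumptions made for this example; they are not taken from the VTP evaluation.

```python
def jain_index(rates):
    """Jain's fairness index of a list of per-flow rates.

    Returns a value in (0, 1]; 1.0 means all flows get the same rate.
    """
    rates = list(rates)
    if not rates:
        raise ValueError("need at least one rate")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        # Assumption for this sketch: an all-zero allocation counts as equal.
        return 1.0
    return total * total / (len(rates) * squares)
```

An index close to 1 for a mix of TCP flows and flows of a new protocol indicates that the flows obtain similar rates.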

In this paper we introduce the notion of opportunistic friendliness to refer 
to the ability of a new flow to use the bandwidth that would be left unused by 
legacy flows. More precisely, a new protocol NP is said to be opportunistically 
friendly to legacy TCP if TCP flows obtain no less throughput when coexisting 
with NP, compared to the throughput that they would achieve if all flows were 
TCP (i.e., NP flows replaced by TCP). The second design goal of VTP is to
be opportunistically friendly to legacy TCP.
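
The definition above can be turned into a simple check on measured throughputs. The sketch below is one possible reading, not a procedure given in the paper: it assumes the comparison is made per TCP flow, between a mixed run (TCP coexisting with NP) and a baseline run in which the NP flows are replaced by TCP. The function name and the `tolerance` parameter, which absorbs measurement noise, are hypothetical.

```python
def is_opportunistically_friendly(tcp_with_np, tcp_all_tcp, tolerance=0.0):
    """Check opportunistic friendliness from two sets of TCP throughputs.

    tcp_with_np: throughput of each TCP flow when coexisting with NP flows.
    tcp_all_tcp: throughput of the same TCP flows when NP flows are
                 replaced by TCP (the baseline run).
    tolerance:   allowed relative shortfall (assumption of this sketch).
    """
    if len(tcp_with_np) != len(tcp_all_tcp):
        raise ValueError("both runs must report the same TCP flows")
    return all(
        mixed >= base * (1.0 - tolerance)
        for mixed, base in zip(tcp_with_np, tcp_all_tcp)
    )
```

With zero random errors the baseline TCP flows already use the full bandwidth, so the check reduces to ordinary friendliness; with frequent errors the baseline throughput is low, which leaves room for NP to use the remainder.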



3 Achieved Rate and Loss Discrimination Algorithm 

3.1 Achieved Rate 

The Achieved Rate (AR) and the Loss Discrimination Algorithm (LDA), which
will be introduced shortly, are two important components of VTP.
AR is the rate that the sender succeeds in pushing through the bottleneck. This
is the rate that the receiver can measure, plus the fraction corresponding to
packet losses at the exit of the bottleneck due to random errors.

For the time being, let us assume zero errors. The receiver samples and filters 
the receiving rate, using an Exponentially Weighted Moving Average (EWMA). 
AR has an intuitive interpretation. Assuming we start with an empty bottleneck, 
each sender can safely transmit for an unlimited time at AR and expect its 
packets to be delivered to the receiver with no buffer overflow. If the sender 
transmits at a rate higher than AR, there is a chance that the extra packets 
will get buffered at the bottleneck queue. The sender will typically transmit, 
over limited periods of time, at rates higher than AR to probe the bandwidth. 
However, following a packet loss, it will step back and restart at or below AR. 

An AR sample $S_k$ is obtained by the receiver as the number of bytes received
during a time period T, divided by T. AR samples are reported back
to the sender, which updates its smoothed AR value $AR_k$ as

$$AR_k = \sigma \cdot AR_{k-1} + (1 - \sigma) \cdot \frac{1}{2} (S_k + S_{k-1}) \qquad (1)$$

where $\sigma$ is a fraction close to 1.

The above scheme works well when no random errors are present. If packets
are lost at the exit of the bottleneck due to errors, they are not received
and counted by the receiver, although they did succeed in squeezing
through the bottleneck. These packets should be included in the sender’s AR
value. This is done jointly with the LDA. Via the LDA, the VTP sender is able
to estimate the fraction of packet losses that are error-induced, i.e., the error
rate e. The AR sample reported by the receiver is then prorated by 1 + e.
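
The AR estimation described in this subsection can be sketched in Python as follows. The sketch assumes that Eq. (1) averages the two most recent samples before smoothing, and that the first sample initializes the filter directly; the class name, the default smoothing value of 0.9, and the initialization rule are assumptions made for illustration.

```python
class AchievedRateEstimator:
    """EWMA filter over AR samples, with error-loss prorating."""

    def __init__(self, smoothing=0.9):
        # smoothing plays the role of the fraction close to 1 in Eq. (1).
        if not 0.0 <= smoothing < 1.0:
            raise ValueError("smoothing must be in [0, 1)")
        self.smoothing = smoothing
        self.ar = None           # smoothed AR value
        self.prev_sample = None  # previous (prorated) sample

    def update(self, received_bytes, period, error_rate=0.0):
        """Feed one receiver report and return the smoothed AR.

        received_bytes / period is the raw sample; it is prorated by
        (1 + error_rate) to account for packets that crossed the
        bottleneck but were lost to random errors.
        """
        sample = received_bytes / period * (1.0 + error_rate)
        if self.ar is None:
            # Assumption: start the filter from the first sample.
            self.ar = sample
        else:
            recent = 0.5 * (sample + self.prev_sample)
            self.ar = self.smoothing * self.ar + (1.0 - self.smoothing) * recent
        self.prev_sample = sample
        return self.ar
```

A smoothing value closer to 1 makes the estimate steadier but slower to follow changes in the available bandwidth.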

3.2 Loss Discrimination Algorithm 

The Loss Discrimination Algorithm (LDA) allows VTP to distinguish error losses
from congestion losses. Intuitively, it suffices to measure the RTT. If RTT is
close to the $RTT_{min}$ measured on this connection, we know the bottleneck is not
congested; the loss must be an error loss. Conversely, if RTT is considerably larger
than $RTT_{min}$, the loss is likely to be due to congestion. We propose to use the
Spike [4] scheme as the LDA in VTP. Spike, as illustrated in Figure 1, is an
end-to-end algorithm based on RTT measurement. A flow enters the spike state if 1)
it was not in the spike state, and 2) RTT exceeds a threshold $B_{start}$. Similarly,
the flow exits the spike state if 1) it was in the spike state, and 2) RTT falls
below another threshold $B_{end}$. $B_{start}$ and $B_{end}$ are defined as:

$$B_{start} = RTT_{min} + \alpha \cdot (RTT_{max} - RTT_{min}) \qquad (2)$$

$$B_{end} = RTT_{min} + \beta \cdot (RTT_{max} - RTT_{min}) \qquad (3)$$

where $\alpha$ and $\beta$ are adjustable parameters. If a loss occurs when the flow is in the
spike state, it is believed to be congestion-induced; otherwise it is error-induced.
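
The spike state machine can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the minimum and maximum RTT are tracked over the whole connection, the two adjustable parameters of Eqs. (2) and (3) are called `alpha` and `beta` here, and their default values are placeholders rather than values prescribed by the paper.

```python
class SpikeLDA:
    """RTT-based loss discrimination following the spike-state rules."""

    def __init__(self, alpha=0.5, beta=0.25):
        # alpha sets the entry threshold, beta the exit threshold.
        self.alpha = alpha
        self.beta = beta
        self.rtt_min = float("inf")
        self.rtt_max = 0.0
        self.in_spike = False

    def observe_rtt(self, rtt):
        """Update RTT extremes and the spike state; return the state."""
        self.rtt_min = min(self.rtt_min, rtt)
        self.rtt_max = max(self.rtt_max, rtt)
        span = self.rtt_max - self.rtt_min
        b_start = self.rtt_min + self.alpha * span  # Eq. (2)
        b_end = self.rtt_min + self.beta * span     # Eq. (3)
        if not self.in_spike and rtt > b_start:
            self.in_spike = True
        elif self.in_spike and rtt < b_end:
            self.in_spike = False
        return self.in_spike

    def classify_loss(self):
        """A loss in the spike state is attributed to congestion."""
        return "congestion" if self.in_spike else "error"
```

Choosing the exit threshold below the entry threshold gives the state machine hysteresis, so a single RTT sample near the boundary does not flip the classification back and forth.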

We must point out that the above LDA works only if the error-prone link is
also the bottleneck. If not, flows that share the bottleneck but do not traverse
the “error” link will keep the bottleneck loaded and the value of RTT high.
Other flows that also traverse the error link will suffer extra losses. However, the
corresponding senders will not be able to classify these losses as error-induced
due to the consistently high value of RTT. These latter flows will reduce their
rates and be “suppressed” by the flows that do not experience random errors.
Fortunately, in virtually all wireless scenarios the wireless error-prone link is
also the bottleneck, e.g. a satellite link or last-hop wireless segment, so all
bottlenecked flows are subject to random errors. Our LDA scheme thus applies
to most wireless situations.

Fig. 1. Spike as a loss discrimination algorithm.

4 VTP Rate Control 

In this section we present the rate control mechanism in VTP. Similar to the 
Additive Increase in TCP congestion control, VTP linearly probes the band- 
width until congestion is detected. VTP does not perform Multiplicative De- 
crease though; instead it reduces the rate to AR, with extra adjustments required 
to mimic the TCP behavior. 

4.1 TCP Behavior in Terms of Rate 

While most streaming protocols operate on the concept of rate, TCP is window- 
based: a congestion window cwnd is used to control the number of outstanding 
packets. Due to this difference, streaming protocols must first understand the 
TCP behavior, in terms of its instantaneous sending rate rather than the window 
size, in order to achieve TCP friendliness. 

We now consider TCP NewReno operating in congestion avoidance. We ig- 
nore slow start since it has less impact on the steady state performance. We also 
focus on the losses caused by congestion and assume no random errors. Consider 
the topology in Figure 2. C, B and P are the link capacity, queue buffer size and
round-trip bandwidth-delay product (namely the pipe size), respectively.
Assuming the buffer size is equal to the pipe size, we have B = P. Cwnd oscillates
between P and B + P = 2P, as the left diagram in Figure 3 shows.

Although TCP increases cwnd at the speed of 1 packet/RTT, it does not
necessarily increase the sending rate, since the extra packets may be buffered
in the queue. With the assumption of B = P in Figure 2, the sender detects
a packet loss when the queue is full, i.e., cwnd = 2P. Cwnd is then halved to
P. Since there are 2P outstanding packets, the sender must temporarily stop
sending and wait until P ACKs have been received, before it can resume the
transmission. Having P outstanding packets means that the queue is drained
while the pipe is full. Next, cwnd will increase by 1 packet/RTT, allowing the
sender to transmit an extra packet every RTT. Other than this, TCP is regularly
transmitting at the rate of the bottleneck capacity C, limited by the arriving
rate of ACKs (i.e., self-clocked).

Fig. 2. A simple topology with buffer size equal to pipe size.

Fig. 3. Congestion window and instantaneous sending rate of TCP NewReno.

The right diagram in Figure 3 illustrates the instantaneous sending rate of
TCP as discussed above. The sending rate of TCP in congestion avoidance,
in the topology of Figure 2, is C + 1/RTT. Note that as RTT grows with more
packets being buffered, the sending rate actually decreases slightly.
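
As a worked illustration of the C + 1/RTT expression, the following Python sketch evaluates the sending rate over one congestion avoidance cycle. It relies on a step that the text does not spell out: once the pipe is full, the RTT equals cwnd/C, i.e. the propagation delay P/C plus the queueing delay (cwnd - P)/C. The function name and the numeric values in the usage note are hypothetical.

```python
def tcp_sending_rate(cwnd, capacity, pipe):
    """Instantaneous TCP sending rate in congestion avoidance.

    cwnd:     congestion window in packets, with pipe <= cwnd <= 2 * pipe
              (buffer size equal to pipe size, as in Figure 2).
    capacity: bottleneck capacity C in packets per second.
    pipe:     pipe size P in packets.
    """
    if not pipe <= cwnd <= 2 * pipe:
        raise ValueError("cwnd must lie between P and 2P")
    # RTT = P/C (propagation) + (cwnd - P)/C (queueing) = cwnd / C.
    rtt = cwnd / capacity
    # Self-clocked rate C plus one extra packet per RTT.
    return capacity + 1.0 / rtt
```

With C = 1000 packets/s and P = 100 packets, the rate falls from about 1010 packets/s at cwnd = P to about 1005 packets/s at cwnd = 2P, which matches the slight decrease noted above.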

4.2 VTP Rate Control 

As we explained earlier, TCP’s instantaneous sending rate drops drastically when
cwnd is cut by half, because the sender must wait until half of the
outstanding packets are drained from the queue/pipe. This rate reduction, as
shown in Figure 3, cannot be implemented as such in VTP. Yet VTP must
respond to congestion by reducing, on average, its rate in the same way as TCP
in order to be TCP friendly. The tradeoff is between the amount of rate reduction
and the length of time this rate is maintained. Simply speaking, VTP may reduce
the rate by less but keep it longer.
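
One way to read this tradeoff is as an equal-deficit condition. The Python sketch below is an illustrative assumption and not the VTP mechanism itself: it supposes that TCP forgoes P packets while it waits for P ACKs at rate C, and asks how long a sender must hold a milder reduced rate to forgo the same number of packets. The function name and parameters are hypothetical.

```python
def reduced_rate_duration(capacity, pipe, reduced_rate):
    """Time to hold reduced_rate so the packet deficit equals TCP's.

    capacity:     bottleneck capacity C in packets per second.
    pipe:         pipe size P in packets (TCP's assumed deficit).
    reduced_rate: rate held after congestion, 0 <= reduced_rate < C.
    """
    if not 0 <= reduced_rate < capacity:
        raise ValueError("reduced_rate must be in [0, C)")
    deficit = pipe  # packets TCP does not send while waiting for P ACKs
    # Solve (C - reduced_rate) * duration = deficit for duration.
    return deficit / (capacity - reduced_rate)
```

Under these assumptions, with C = 1000 packets/s and P = 100 packets, stopping completely takes 0.1 s, halving the rate takes 0.2 s, and reducing it by only a quarter takes 0.4 s: a smaller reduction has to be kept for longer.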

Figure 4 illustrates the VTP rate control mechanism and compares it to TCP. 
Note that Figure 4 reflects just one of the three cycles in Figure 3. Also, curves 



